Reduce cyclomatic complexity of try_merge_additional_files()#292
Conversation
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
|
Important Review skippedThis PR was authored by the user configured for CodeRabbit reviews. By default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback. To trigger a single review, invoke the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR refactors the survey-file merging workflow by extracting multiple helper functions from try_merge_additional_files(), centralizing merge logic and warnings, and fixes incorrect unmatched-merge warning counts when duplicate keys are present. The main function now delegates to the new helpers and returns an observation_key with merged surveys. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/load-survey-utils.R`:
- Around line 253-279: The loop block around get_mergeable_files /
try_merge_one_file has lint/style violations; run lintr::lint_package(), fix any
flagged issues in this block (e.g., spacing, assignment style, missing commas,
line-wrapping, unused variables, and NULL vs logical checks) — specifically
review the usages of get_mergeable_files, try_merge_one_file,
main_surveys[[type]], merged_files and setdiff and adjust code to conform to
lintr rules (fix indentation, ensure commas and parentheses are correct, prefer
consistent assignment operators, and avoid constructing NULL vectors
incorrectly); after fixing, run devtools::document() to regenerate roxygen docs
and re-run lintr until no warnings remain before committing.
- Around line 215-230: The unmatched-merge warning is wrong when the main table
has duplicate keys because matched_main counts join pairs; change the logic to
count unique matched IDs so unmatched_merge uses the number of distinct matched
..merge_id values rather than row pairs. Specifically, compute
matched_main_unique as the number of unique non-NA values in
merged[["..merge_id"]] (using unique/na.omit) and use that value when computing
unmatched_merge (nrow(contact_data[[file]]) - matched_main_unique) and in the
related warnings (replace uses of matched_main with matched_main_unique where
indicating unique matched rows).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
NEWS.mdR/globals.RR/load-survey-utils.Rman/get_mergeable_files.Rdman/resolve_longitudinal_key.Rdman/try_merge_one_file.Rd
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
R/load-survey-utils.R (1)
254-288: Verify lint compliance after the formatting fix commit.The prior CI failures from
air/reviewdog around this block were reported in an earlier review and the latest commit claims to have fixed them. Please confirmlintr::lint_package()still passes cleanly for this file.#!/bin/bash # Verify lintr passes for the modified file rg -n "get_mergeable_files|try_merge_one_file|merged_files" R/load-survey-utils.R | head -40As per coding guidelines: "Run
lintr::lint_package()to check for style issues and fix any in files you modified before committing."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 254 - 288, Run lintr::lint_package() and address any lint errors in the updated merge loop area: inspect usage of get_mergeable_files, try_merge_one_file, and the merged_files variable handling in the merge loop and fix style issues (e.g., spacing, line length, assignment consistency, NULL checks) so that lintr reports no issues for this region; re-run lintr::lint_package() and commit the formatting/style fixes.
🧹 Nitpick comments (2)
R/load-survey-utils.R (2)
135-146:resolve_longitudinal_keysilently discards an explicitly providedparticipant_keywithout warning.If
participant_keyis non-NULL but either has columns missing frommergedor doesn't uniquely identify rows, the function falls through tofind_unique_key()with no diagnostic. The user gets a "Detected longitudinal data…" message citing an auto-detected key with no indication their explicit key was rejected.Consider emitting a
cli::cli_warn()before falling through:♻️ Proposed fix
resolve_longitudinal_key <- function(merged, participant_key = NULL) { if (!is.null(participant_key)) { missing_cols <- setdiff(participant_key, names(merged)) if ( length(missing_cols) == 0 && anyDuplicated(merged, by = participant_key) == 0L ) { return(participant_key) + } else if (length(missing_cols) > 0) { + cli::cli_warn( + "Provided {.arg participant_key} column{?s} not found in merged data: \\ + {.val {missing_cols}}. Attempting auto-detection." + ) + } else { + cli::cli_warn( + "Provided {.arg participant_key} does not uniquely identify rows. \\ + Attempting auto-detection." + ) } } find_unique_key(merged, "part_id") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 135 - 146, resolve_longitudinal_key currently ignores a non-NULL participant_key if columns are missing or it is not unique and falls through to find_unique_key without notifying the user; update resolve_longitudinal_key to detect these two failure modes (missing_cols != NULL/length > 0 and anyDuplicated(merged, by = participant_key) != 0L) and emit a cli::cli_warn() that includes the provided participant_key and the reason (missing columns list or non-unique key) before calling find_unique_key, so callers see why their explicit key was rejected.
162-235:..merge_idis not cleaned fromcontact_data[[file]]on early return.
data.table's:=at Line 163 mutates the table in-place through R's reference semantics. Whentry_merge_one_filereturns early (Lines 191, 199, or 212),contact_data[[file]]in the caller retains the..merge_idcolumn. Line 235 only removes it from the freshly-createdmergedtable.In practice this is harmless — successfully-merged files are dropped from
survey_files, and failed files have..merge_idoverwritten on the next retry — but it is a subtle side effect worth guarding against:♻️ Proposed cleanup on early returns
if (is.null(merged)) { + contact_data[[file]][, ("..merge_id") := NULL] return(list(merged = NULL, detected_key = NULL, file = NULL)) } has_duplicates <- anyDuplicated(merged[, "..main_id", with = FALSE]) > 0 accept_merge <- !has_duplicates detected_key <- NULL if (has_duplicates && type == "contact") { + contact_data[[file]][, ("..merge_id") := NULL] return(list(merged = NULL, detected_key = NULL, file = NULL)) } ... if (!accept_merge) { + contact_data[[file]][, ("..merge_id") := NULL] return(list(merged = NULL, detected_key = NULL, file = NULL)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 162 - 235, The code adds a temporary ..merge_id column to contact_data[[file]] in try_merge_one_file but returns early in several places (when merged is NULL, when has_duplicates for type "contact", and when !accept_merge) without removing that column, leaving a mutated contact_data; fix by avoiding in-place mutation or ensuring cleanup: either operate on a copy (use data.table::copy(contact_data[[file]]) before assigning ..merge_id) or register a cleanup with on.exit({ contact_data[[file]][, "..merge_id" := NULL] }, add = TRUE) immediately after creating ..merge_id so the temporary column is removed regardless of which early return path (references: contact_data[[file]], ..merge_id, try_merge_one_file, merged, accept_merge).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@R/load-survey-utils.R`:
- Around line 254-288: Run lintr::lint_package() and address any lint errors in
the updated merge loop area: inspect usage of get_mergeable_files,
try_merge_one_file, and the merged_files variable handling in the merge loop and
fix style issues (e.g., spacing, line length, assignment consistency, NULL
checks) so that lintr reports no issues for this region; re-run
lintr::lint_package() and commit the formatting/style fixes.
---
Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 135-146: resolve_longitudinal_key currently ignores a non-NULL
participant_key if columns are missing or it is not unique and falls through to
find_unique_key without notifying the user; update resolve_longitudinal_key to
detect these two failure modes (missing_cols != NULL/length > 0 and
anyDuplicated(merged, by = participant_key) != 0L) and emit a cli::cli_warn()
that includes the provided participant_key and the reason (missing columns list
or non-unique key) before calling find_unique_key, so callers see why their
explicit key was rejected.
- Around line 162-235: The code adds a temporary ..merge_id column to
contact_data[[file]] in try_merge_one_file but returns early in several places
(when merged is NULL, when has_duplicates for type "contact", and when
!accept_merge) without removing that column, leaving a mutated contact_data; fix
by avoiding in-place mutation or ensuring cleanup: either operate on a copy (use
data.table::copy(contact_data[[file]]) before assigning ..merge_id) or register
a cleanup with on.exit({ contact_data[[file]][, "..merge_id" := NULL] }, add =
TRUE) immediately after creating ..merge_id so the temporary column is removed
regardless of which early return path (references: contact_data[[file]],
..merge_id, try_merge_one_file, merged, accept_merge).
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
R/load-survey-utils.R (2)
118-124:vapplyFUN.VALUE— preferlogical(1)overTRUE.
TRUEworks as a template value butlogical(1)is the idiomatic form that makes the expected return type explicit.♻️ Proposed refactor
can_merge <- vapply( survey_files, function(x) { any(colnames(contact_data[[x]]) %in% main_cols) }, - TRUE + logical(1) )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 118 - 124, The vapply call that computes can_merge uses TRUE as the FUN.VALUE template; change that to logical(1) so the expected return type is explicit — update the vapply over survey_files (the anonymous function inspecting colnames(contact_data[[x]]) %in% main_cols) to use FUN.VALUE = logical(1) instead of TRUE.
160-165:..merge_idadded tocontact_data[[file]]in place but never cleaned up.
contact_data[[file]][, ..merge_id := seq_len(.N)]uses data.table reference semantics — it modifies the table in place even without the outer<-assignment. Aftertry_merge_one_filereturns,..merge_idpersists on the original data.table. It's overwritten safely if the file is retried, and the side-effect is contained withinjoin_possible_files, so this is not a correctness bug. Consider addingon.exit(contact_data[[file]][, ("..merge_id") := NULL])or an explicit cleanup afterwarn_merge_qualityfor hygiene.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 160 - 165, The temporary column "..merge_id" is added in place to contact_data[[file]] (via contact_data[[file]][, ..merge_id := seq_len(.N)]) and is never removed; update try_merge_one_file (or the caller join_possible_files) to always clean up that column after use—either add on.exit(contact_data[[file]][, ("..merge_id") := NULL]) at the start of try_merge_one_file or explicitly remove ("..merge_id") := NULL immediately after warn_merge_quality/merge logic so the side-effect on contact_data[[file]] is eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/load-survey-utils.R`:
- Around line 262-318: Initialize merged_files as character(0) (not NULL) in the
merge loop inside try_merge_one_file flow (symbol: merged_files) and change the
break check to test its length (e.g., if (length(merged_files) == 0) break) so
the grow-with-c() pattern doesn't trigger vector_initialize_linter and avoids an
infinite loop; in inform_longitudinal_key replace the escaped double-quote
construction that builds key_code (currently paste0("\"", detected_key, "\"",
...)) with single-quote usage (e.g., paste0("'", detected_key, "'", collapse =
", ")) to satisfy quotes_linter, then run lintr::lint_package() and the air
formatter to confirm no remaining style issues.
- Around line 133-143: resolve_longitudinal_key currently ignores a
user-supplied participant_key when it's invalid without notifying the caller;
modify resolve_longitudinal_key to validate participant_key (using missing_cols
<- setdiff(participant_key, names(merged)) and anyDuplicated(merged, by =
participant_key)) and if invalid emit a clear warning message stating which
columns are missing or that the key is not unique and that participant_key will
be ignored, then fall back to find_unique_key(merged, "part_id"); keep the
original return behavior but ensure the warning references participant_key,
missing_cols and the uniqueness check so users know why their key was discarded.
---
Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 118-124: The vapply call that computes can_merge uses TRUE as the
FUN.VALUE template; change that to logical(1) so the expected return type is
explicit — update the vapply over survey_files (the anonymous function
inspecting colnames(contact_data[[x]]) %in% main_cols) to use FUN.VALUE =
logical(1) instead of TRUE.
- Around line 160-165: The temporary column "..merge_id" is added in place to
contact_data[[file]] (via contact_data[[file]][, ..merge_id := seq_len(.N)]) and
is never removed; update try_merge_one_file (or the caller join_possible_files)
to always clean up that column after use—either add
on.exit(contact_data[[file]][, ("..merge_id") := NULL]) at the start of
try_merge_one_file or explicitly remove ("..merge_id") := NULL immediately after
warn_merge_quality/merge logic so the side-effect on contact_data[[file]] is
eliminated.
Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
R/load-survey-utils.R (2)
167-167: Inconsistent:=style for..-prefixed column names.Line 167 uses the bare-symbol form (
..merge_id := ...) while lines 209 and 213 use the explicit parenthesized-string form (("..main_id") := ...). Both are functionally correct — in data.table's:=context, a bare LHS symbol is taken as a literal column name, so..merge_idcreates the column"..merge_id". However, readers familiar with data.table's..parent-frame lookup convention for column selection may find the bare form confusing. Line 361 has the same pattern.♻️ Suggested fix — use the parenthesized form consistently
- contact_data[[file]] <- contact_data[[file]][, - ..merge_id := seq_len(.N) - ] + contact_data[[file]] <- contact_data[[file]][, + ("..merge_id") := seq_len(.N) + ]And at line 361:
- main_surveys[[type]] <- result$merged[, ..main_id := NULL] + main_surveys[[type]] <- result$merged[, ("..main_id") := NULL]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` at line 167, The code mixes two styles for creating columns named with a leading ".." (bare-symbol form like ..merge_id := seq_len(.N) vs the explicit parenthesized-string form like ("..main_id") := ...); update the bare uses (e.g., the ..merge_id assignment and the similar occurrence around line 361) to the consistent parenthesized-string form so they become ("..merge_id") := seq_len(.N) to match ("..main_id") := ... elsewhere, keeping the same RHS logic and ensuring consistent readability across the file.
324-328: Single-elementdetected_keyemitsc("x")instead of"x"in the suggested call.When
length(detected_key) == 1,key_codebecomesc("sday")rather than the more natural"sday". This is valid R but slightly noisier than necessary in the user-facing message.♻️ Optional fix
- key_code <- paste0( - "c(", - paste0('"', detected_key, '"', collapse = ", "), - ")" - ) + key_code <- if (length(detected_key) == 1L) { + paste0('"', detected_key, '"') + } else { + paste0( + "c(", + paste0('"', detected_key, '"', collapse = ", "), + ")" + ) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 324 - 328, The generated key_code currently always wraps detected_key in c(...), producing c("x") for single-element vectors; change the construction of key_code to format as a plain quoted string when length(detected_key) == 1 and only use the c("...","...") form when length > 1 — i.e., inspect detected_key, and set key_code to '"' + detected_key + '"' for a single value and to 'c("a","b",...)' (using the existing paste0 collapse logic) otherwise so the user-facing suggestion shows "sday" instead of c("sday").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/load-survey-utils.R`:
- Around line 133-145: In resolve_longitudinal_key, distinguish the two failure
modes: check missing_cols (setdiff(participant_key, names(merged))) first and if
non-empty emit a warning that names in participant_key are missing and list
missing_cols; otherwise (when all columns exist but anyDuplicated(..., by =
participant_key) != 0L) emit the existing warning that the provided
participant_key does not uniquely identify rows and that you will auto-detect a
key; ensure messages reference the variable participant_key and include the
missing column names when applicable.
---
Nitpick comments:
In `@R/load-survey-utils.R`:
- Line 167: The code mixes two styles for creating columns named with a leading
".." (bare-symbol form like ..merge_id := seq_len(.N) vs the explicit
parenthesized-string form like ("..main_id") := ...); update the bare uses
(e.g., the ..merge_id assignment and the similar occurrence around line 361) to
the consistent parenthesized-string form so they become ("..merge_id") :=
seq_len(.N) to match ("..main_id") := ... elsewhere, keeping the same RHS logic
and ensuring consistent readability across the file.
- Around line 324-328: The generated key_code currently always wraps
detected_key in c(...), producing c("x") for single-element vectors; change the
construction of key_code to format as a plain quoted string when
length(detected_key) == 1 and only use the c("...","...") form when length > 1 —
i.e., inspect detected_key, and set key_code to '"' + detected_key + '"' for a
single value and to 'c("a","b",...)' (using the existing paste0 collapse logic)
otherwise so the user-facing suggestion shows "sday" instead of c("sday").
…:= style Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
R/load-survey-utils.R (1)
293-295: Conflictingdetected_keyvalues from different files are silently overwritten.If two separate survey files processed within the same for-loop pass both produce a non-
NULLdetected_key, the second silently replaces the first without any diagnostic. In practice a single type has at most one longitudinal file, but a guard would catch unexpected cases early.💡 Optional guard
if (!is.null(result$detected_key)) { + if (!is.null(detected_key) && !setequal(detected_key, result$detected_key)) { + cli::cli_warn( + "Multiple files produced conflicting longitudinal keys; \\ + using key from {.file {basename(file)}}.", + call = call + ) + } detected_key <- result$detected_key }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 293 - 295, The code silently overwrites an existing detected_key when result$detected_key is non-NULL; update the block around detected_key assignment to check if detected_key is already set before overwriting (i.e., test existing detected_key variable), and if it is set then emit a diagnostic (warning or stop) that names the conflicting files/variables (or append to a list) so conflicts are surfaced; reference the variables result$detected_key and detected_key and perform the guard/diagnostic instead of unconditional reassignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/load-survey-utils.R`:
- Around line 381-386: The warning uses the full path for {.file {file}} but
other messages use basename(file), causing inconsistent UX; update the
cli::cli_warn call in the loop to use basename(file) (i.e., {.file
{basename(file)}}) so it matches warn_merge_quality and the abort in
try_merge_one_file and preserves the same shortened filename presentation.
- Around line 133-155: Add a call parameter to resolve_longitudinal_key (e.g.
call = rlang::caller_env()), pass that call into both cli::cli_warn calls so
warnings report the correct caller, and ensure the function still calls
find_unique_key as before; then update the call site in try_merge_one_file to
forward its call argument into resolve_longitudinal_key (so try_merge_one_file
-> resolve_longitudinal_key receives the same call), matching the pattern used
by other helpers like inform_longitudinal_key.
---
Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 293-295: The code silently overwrites an existing detected_key
when result$detected_key is non-NULL; update the block around detected_key
assignment to check if detected_key is already set before overwriting (i.e.,
test existing detected_key variable), and if it is set then emit a diagnostic
(warning or stop) that names the conflicting files/variables (or append to a
list) so conflicts are surfaced; reference the variables result$detected_key and
detected_key and perform the guard/diagnostic instead of unconditional
reassignment.
…istently Co-authored-by: sbfnk <sebastian.funk@lshtm.ac.uk>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
R/load-survey-utils.R (2)
179-181: data.table reference semantics:..merge_idis added in-place to the caller's data.
:=mutates the originalcontact_data[[file]]data.table before the list-level copy-on-modify kicks in, so the caller'scontact_data[[file]]permanently gains the..merge_idcolumn. After a successful merge, onlymerged[, ("..merge_id") := NULL](line 226) cleans up the merged result; the source data.table is never cleaned. This is contained within the current flow (the file is dropped fromsurvey_fileson success), but it is an invisible side-effect on the caller'scontact_data.To avoid mutating the caller's data, copy before adding the tracking column:
♻️ Proposed fix
- contact_data[[file]] <- contact_data[[file]][, - ("..merge_id") := seq_len(.N) - ] + file_data <- data.table::copy(contact_data[[file]])[, + ("..merge_id") := seq_len(.N) + ]Then replace subsequent references to
contact_data[[file]]within this function withfile_data.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 179 - 181, The code mutates the caller's data.table by adding ("..merge_id") with := on contact_data[[file]]; to fix, make a copy first (e.g., assign file_data <- copy(contact_data[[file]])) then perform file_data[, ("..merge_id") := seq_len(.N)] and use file_data for all subsequent operations in this function (including merging) so the original contact_data[[file]] is not modified; keep the existing cleanup merged[, ("..merge_id") := NULL] as-is after merge and ensure any later references that previously used contact_data[[file]] in this function are switched to file_data.
402-408: Addcallparameter tojoin_possible_filesfor consistent warning attribution.Every helper function in this file accepts
call = rlang::caller_env()and threads it through tocli_warn/cli_abort.join_possible_filesbreaks that chain: when it callstry_merge_additional_fileswithout passingcall, the defaultrlang::caller_env()capturesjoin_possible_files's own environment, attributing all warnings tojoin_possible_filesrather than the user-facing caller (load_survey()).♻️ Proposed fix
join_possible_files <- function( survey_files, contact_data, main_types, main_surveys, - participant_key = NULL + participant_key = NULL, + call = rlang::caller_env() ) { survey_contact_data <- join_compatible_files(survey_files, contact_data) contact_data <- survey_contact_data$contact_data survey_files <- survey_contact_data$survey_files result <- try_merge_additional_files( main_types, main_surveys, survey_files, contact_data, - participant_key = participant_key + participant_key = participant_key, + call = call )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/load-survey-utils.R` around lines 402 - 408, The join_possible_files function should accept a call = rlang::caller_env() parameter and thread it into any downstream warning/abort invocations so warnings are attributed to the original caller; update join_possible_files signature to include call, pass call into try_merge_additional_files where it is invoked, and ensure any cli_warn/cli_abort calls inside join_possible_files (or calls it makes) use that call parameter (consistent with other helpers and with load_survey()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@R/load-survey-utils.R`:
- Around line 179-181: The code mutates the caller's data.table by adding
("..merge_id") with := on contact_data[[file]]; to fix, make a copy first (e.g.,
assign file_data <- copy(contact_data[[file]])) then perform file_data[,
("..merge_id") := seq_len(.N)] and use file_data for all subsequent operations
in this function (including merging) so the original contact_data[[file]] is not
modified; keep the existing cleanup merged[, ("..merge_id") := NULL] as-is after
merge and ensure any later references that previously used contact_data[[file]]
in this function are switched to file_data.
- Around line 402-408: The join_possible_files function should accept a call =
rlang::caller_env() parameter and thread it into any downstream warning/abort
invocations so warnings are attributed to the original caller; update
join_possible_files signature to include call, pass call into
try_merge_additional_files where it is invoked, and ensure any
cli_warn/cli_abort calls inside join_possible_files (or calls it makes) use that
call parameter (consistent with other helpers and with load_survey()).
Closes #289.
Extract three focused helpers from
try_merge_additional_files()to bring cyclomatic complexity from 37 down to under 15:get_mergeable_files()— identifies which survey files share columns with the main table (was duplicated inline)resolve_longitudinal_key()— validates a user-providedparticipant_keyor auto-detects viafind_unique_key()try_merge_one_file()— handles merging a single file including error handling, duplicate detection, and match quality warningsThe outer function becomes a simple orchestrator. All existing tests pass unchanged.
Summary by CodeRabbit